Support 3D Weights in GPTQ Algorithm #3835
Draft
Changes
The approach here is quite straightforward: _quantize_weights works as usual for 2D weights. The difference is in the Hessian calculation, where the Hessian is now 3D for both 2D and 3D weights. For 2D weights, the Hessian has the shape (1, hidden_dim, hidden_dim); before this change it was just (hidden_dim, hidden_dim). For 3D weights, it is (num_experts/batch, hidden_dim, hidden_dim).
The 3D ("batched") Hessian is then looped over. For each batch index, the corresponding 2D weight is extracted and passed to the old _quantize_weights function as usual, which returns the scale and zero point (zp). These scales and zero points are stacked together in a collector variable. For the 2D case, the stacked result is flattened. For the 3D case, the stacked scale and zp are returned as they are.
NOTE: Scale Estimation + GPTQ support is not added for 3D weights yet.
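The loop described above can be sketched roughly as follows. This is an illustrative NumPy sketch, not the code in this PR: `quantize_2d_stub`, `batched_hessian`, and `quantize_weights_batched` are hypothetical names, the stub uses plain per-row min/max quantization instead of the real GPTQ update, the Hessian formula `2 * X @ X.T / n_samples` is an assumed normalization, and "flattened" is interpreted here as dropping the leading batch dimension of size 1.

```python
import numpy as np


def quantize_2d_stub(weight, hessian):
    """Hypothetical stand-in for the existing 2D routine.

    Returns a per-row 4-bit asymmetric scale and zero point. The Hessian
    argument is accepted only to mirror the call shape; this stub ignores it.
    """
    w_min = weight.min(axis=1, keepdims=True)
    w_max = weight.max(axis=1, keepdims=True)
    scale = np.maximum(w_max - w_min, 1e-8) / 15.0
    zero_point = np.round(-w_min / scale)
    return scale, zero_point


def batched_hessian(inputs):
    """inputs: (batch, hidden_dim, n_samples) -> (batch, hidden_dim, hidden_dim)."""
    n_samples = inputs.shape[-1]
    return 2.0 * inputs @ inputs.transpose(0, 2, 1) / n_samples


def quantize_weights_batched(weight, inputs):
    """Quantize a 2D (out, hidden_dim) or 3D (batch, out, hidden_dim) weight."""
    is_2d = weight.ndim == 2
    if is_2d:
        # Treat the 2D case as a batch of one so both cases share one loop.
        weight = weight[None]
        inputs = inputs[None]

    hessian = batched_hessian(inputs)

    scales, zero_points = [], []
    for i in range(hessian.shape[0]):
        # Each slice is an ordinary 2D problem for the existing routine.
        scale, zp = quantize_2d_stub(weight[i], hessian[i])
        scales.append(scale)
        zero_points.append(zp)

    scale = np.stack(scales)
    zero_point = np.stack(zero_points)
    if is_2d:
        # Drop the artificial batch dimension again for 2D weights.
        scale, zero_point = scale[0], zero_point[0]
    return scale, zero_point
```

With a weight of shape (4, 8, 16) and inputs of shape (4, 16, 32), the sketch builds a Hessian of shape (4, 16, 16) and returns scales of shape (4, 8, 1). Passing a single 2D slice yields the same values as the matching slice of the 3D result.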
Reason for changes
Support 3D weights for models like MoE in GPTQ
Related tickets
175789 & 175212
Tests